Real-time Implementation of MELP Vocoder
Authors
Abstract
A real-time implementation of a 2.4 Kbps mixed excitation linear prediction (MELP) speech-coding algorithm on a 20 MIPS TMS320C44 digital signal processor is described. The paper emphasizes efficient coding and optimization strategies for the computationally intensive modules, exploiting the fast memories, registers and instruction set of the C44. Although the implementation is based on a specific DSP architecture, the optimization techniques described are applicable to many types of DSP platforms. The intelligibility of speech from the developed MELP vocoder has been measured in both quiet and noisy environments using DRT and A/B tests. The quality of the processed speech has also been compared using MOS.

INTRODUCTION

Because of bandwidth constraints, low bit-rate vocoders have gained increasing prominence in many digital voice communication systems, including the Internet. The requirement for secure voice transmission through encryption and decryption has also prompted the widespread use of digital speech coding techniques in various military applications (Richard 1997). The FS-1015 LPC-10 and the FS-1016 CELP standards were adopted as US Federal Standards in 1984 and 1991, respectively.

The FS-1015 LPC-10 (2.4 Kbps) vocoder uses a 10th-order linear prediction model (Atal and Hanauer 1971, Makhoul 1975, Makhoul et al. 1978, Kwon and Goldberg 1984) to extract formant information from the input speech. The linear prediction coefficients are then converted to reflection coefficients for quantization and transmission. Two voicing states, voiced and unvoiced, determine whether pulse or noise excitation is used to synthesize the output speech. However, vocoders based on classical linear prediction, while capable of producing highly intelligible speech at low bit-rates, suffer from “buzzy” speech and from isolated tones and thumps caused by occasional voicing errors.
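As an illustration of the linear prediction step mentioned for LPC-10, the sketch below runs the Levinson-Durbin recursion, which yields the predictor coefficients and the reflection coefficients together. It is a minimal Python sketch, not the code used in the paper: the function names are hypothetical, the error filter is assumed to be A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, and the sign convention for reflection coefficients varies between texts.

```python
def autocorrelation(frame, order):
    """Return autocorrelation values r[0..order] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, k, err):
      a   - predictor polynomial, a[0] == 1.0 (assumed convention, see lead-in)
      k   - reflection coefficients k[0..order-1]
      err - residual prediction error energy
    """
    a = [1.0] + [0.0] * order
    k = [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k[i - 1] = ki
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a, k, err


# Usage: a 10th-order analysis of a hypothetical 180-sample frame.
frame = [((7 * n) % 23) - 11 for n in range(180)]  # stand-in for real speech
a, k, err = levinson_durbin(autocorrelation(frame, 10), 10)
```

A stable synthesis filter corresponds to every reflection coefficient having magnitude below one, which is one reason they are convenient to quantize.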
The FS-1016 CELP vocoder (4.8 Kbps) also uses a 10th-order LPC analysis for “short-term prediction”. It is an analysis-by-synthesis (AbS) vocoder: the encoder contains a local decoder so that the synthesized speech is available for analysis. In this way, the subsystems are jointly optimized so that the overall synthetic speech has minimum distortion. Owing to the exhaustive codebook search and filtering needed to maximize the match score, the computational load is very high, and intense research has produced many alternative codebook structures and search methods. Real-time single-chip versions of the 4.8 Kbps CELP have been reported, but they usually perform only a subset of the full algorithm (NCS 1991).

1 Assoc. Professor, School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798.
2 Postgraduate student, School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798.
Journal of The Institution of Engineers, Singapore Vol. 44 Issue 3 2004 39

Most recently, in March 1997, the US DoD standardized a 2.4 Kbps mixed excitation linear prediction (MELP) vocoder (McCree and Barnwell 1995, McCree et al. 1996, McCree and Martin 1998) to replace both the FS-1015 and FS-1016 vocoders. The MELP vocoder comprises an encoder and a decoder. The MELP encoder is broadly similar to a classical linear prediction encoder. In addition, it converts the linear prediction coefficients to line spectral frequencies for quantization, adds a third voicing state for jittery voiced frames, and computes five band-pass voicing decisions that control the noise/pulse mixture at the synthesizer. The synthesizer of the MELP decoder also includes adaptive enhancement filtering and a pulse dispersion filter that improves the match between synthesized and original voiced speech.
The “buzziness” that is a common problem in classical LPC speech is removed by the noise mixture algorithm, and the thumps and tonal noises due to voicing errors are removed using the third voicing state.

This paper describes the implementation of a MELP vocoder on a TMS320C44 (40 MHz/20 MIPS) (Texas Instruments 1996) DSP board with a Pentium PC host. The focus is on optimizing the computationally intensive modules in order to achieve real-time performance as well as good speech quality. The digital signal processor features an instruction pipeline that allows one instruction to be executed per cycle. Its instruction set has circular addressing (which expedites the implementation of circular buffers for filters) and parallel multiply/add and multiply/store instructions. The C44 uses three kinds of memory: internal (2 accesses per instruction cycle), local (1 access per cycle), and global (1 access per cycle). The global memory can be shared between different C44 processors. In our application, the global memory is mainly used for large tables and arrays of constants, the local memory mostly for program code, and the internal memory for variables that are accessed frequently (e.g. filter coefficients and FFT tables).

MELP ENCODER

The encoder is divided into an analyzer and a quantizer. Incoming speech is sampled 8000 times per second at 16 bits per sample. Each sample is stored in a circular FIFO buffer and packed into a frame. Each frame is made up of 180 samples (22.5 ms at the 8000 Hz sampling frequency). The parameters extracted from the input speech are the pitch, the 10th-order linear prediction coefficients, the band-pass voicings, the gain and an aperiodic indicator. By buffering the previous frame of speech in memory, speech information in both the current and previous frames is available for analysis.
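The framing and two-frame buffering just described can be sketched as follows. This is a minimal Python illustration rather than the C44 code from the paper: the class name `FrameBuffer` and its methods are hypothetical, and modulo indexing on a Python list stands in for the DSP's hardware circular addressing.

```python
FRAME_LEN = 180          # samples per frame (from the text)
SAMPLE_RATE_HZ = 8000    # sampling frequency (from the text)
FRAME_MS = 1000 * FRAME_LEN / SAMPLE_RATE_HZ  # 22.5 ms per frame


class FrameBuffer:
    """Circular buffer holding the previous and current speech frames."""

    def __init__(self, frame_len=FRAME_LEN, frames_kept=2):
        self.frame_len = frame_len
        self.size = frame_len * frames_kept
        self.buf = [0] * self.size
        self.write_pos = 0   # next slot to overwrite (the oldest sample)
        self.count = 0       # total number of samples received

    def push(self, sample):
        """Store one sample; return True when a frame has just been completed."""
        self.buf[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.size
        self.count += 1
        return self.count % self.frame_len == 0

    def analysis_window(self):
        """Return the buffered samples in time order, oldest first."""
        return self.buf[self.write_pos:] + self.buf[:self.write_pos]


# Usage: feed samples one at a time and analyse whenever a frame completes.
fb = FrameBuffer()
for n in range(3 * FRAME_LEN):
    if fb.push(n):
        window = fb.analysis_window()  # previous frame followed by current frame
```

Keeping two frames in one ring avoids copying the previous frame each time a new one arrives; only the write index moves.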
This effectively makes the analysis frame (which overlaps the past and current frames) much larger than the actual input speech frame. A larger analysis frame makes it easier to track drastic changes in speech characteristics. In addition, sudden and short changes in the speech can be analyzed in the context of a larger number of past and future samples. However, buffering speech in this manner introduces delay that must be taken into account. The analysis window sizes for determining the pitch, linear prediction coefficients, gain and peakiness measures are different (Federal Information Processing Standards Publication 1998). For example, the pitch analysis window is made up of 160 samples from the past frame and another 160 samples from the current frame. The linear prediction analysis window is made up of the last 100 samples from the past frame and the first 100 samples of the current frame. Taking the pitch window as a reference, the delay is a total of 180+160 samples, or about 1.89 frames. In addition, the encoder buffers the three most recent sets of parameters and transmits the middle set. These buffered parameters are used for pitch and voicing corrections based on the past, current and future frames, which adds one frame of delay. The decoder later also buffers the parameters from the previous and current frames to perform interpolation for the current pitch period, which adds a further frame of delay. The internal buffering requirements therefore give a total delay of about 3.89 frames, or 87.5 ms.

Pitch and Voicing Analysis

Pitch detection in MELP is carried out using autocorrelation analysis. The autocorrelation coefficients are given by (Federal Information Processing Standards Publication 1998)

$$ r(\tau) = \frac{c_\tau(0,\tau)}{\sqrt{c_\tau(0,0)\, c_\tau(\tau,\tau)}} $$
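The normalized autocorrelation used for pitch detection can be evaluated directly, as in the sketch below. It assumes that each term c(m, n) is a sum of products s[k+m]·s[k+n] over a fixed 160-sample window; the exact window placement prescribed by the standard is not reproduced here, and the function names and the lag range passed by the caller are illustrative choices, not taken from the paper.

```python
import math


def cross_term(s, start, length, m, n):
    """Sum of s[k+m] * s[k+n] over `length` samples beginning at `start`."""
    return sum(s[start + k + m] * s[start + k + n] for k in range(length))


def normalized_autocorr(s, start, length, tau):
    """Normalized autocorrelation at lag tau; its magnitude never exceeds 1."""
    num = cross_term(s, start, length, 0, tau)
    den = math.sqrt(cross_term(s, start, length, 0, 0) *
                    cross_term(s, start, length, tau, tau))
    return num / den if den > 0.0 else 0.0


def best_lag(s, start, length, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest normalized value."""
    return max(range(min_lag, max_lag + 1),
               key=lambda tau: normalized_autocorr(s, start, length, tau))


# Usage: a synthetic tone with a 50-sample period stands in for voiced speech.
signal = [math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
lag = best_lag(signal, 0, 160, 40, 90)
```

Normalizing by the energies of both the reference and the shifted window keeps the measure insensitive to level changes within the analysis window. This brute-force search costs one inner product per candidate lag, which is the kind of loop the paper targets for optimization.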
Similar resources
Implementation of an enhanced fixed point variable bit-rate MELP vocoder on TMS320C549
In this paper, a fixed-point Variable Bit-Rate (VBR) Mixed Excitation Linear Predictive Coding (MELP™) vocoder is presented. The VBR-MELP vocoder is also implemented on a TMS320C54x, and it achieves quality virtually indistinguishable from federal standard MELP at bit-rates between 1.0 and 1.6 kb/s. The backbone of the VBR-MELP vocoder is similar to that of federal standard MELP. It utilizes a novel sub...
Performance of the Federal Standard 2.4 kbps MELP Vocoder Over ATM Networks
This paper evaluates the performance of the federal standard 2.4 kbps MELP vocoder when used over ATM networks that are subject to cell loss. It begins by addressing the sensitivity of MELP to random deletion of frames; it quantifies the extent to which the quality of the reconstructed speech is affected by two factors – the rate at which frames are lost and the burstiness of that loss process....
47.5 U.S. Federal Standard MELP Vocoder Tactical Performance Enhancement via MAP Error Correction
The United States government has developed a new Federal Standard 2400 bps vocoding algorithm called MELP (Mixed Excitation Linear Prediction) [1]. This vocoder has very acceptable voice quality under benign, error-free channel conditions. However, when subjected to high error conditions as could be experienced in tactical vehicular operations, amelioration techniques may be employed which take ad...
Speech Coding at very low rate
This work is an introduction to research in the field of speech coding. The goal is to become familiar with the latest speech analysis and coding techniques in order to develop two coders operating at very low bit rates (2400 and 1200 bps). The choice of encoding algorithm has been focused on MELP (Mixed Excitation Linear Prediction), which has a flow rate ratio / interest...
Real Time signal Transposition with envelope Preservation in the phase vocoder
The following article presents a new real-time implementation of an iterative cepstrum-based spectral envelope estimation technique that was originally published under the name true envelope. Because the original algorithm is hardly known outside Japan, we will first describe the algorithm and compare it to the standard techniques, i.e. LPC and discrete cepstrum. The estimation properties are co...